feat: Refactor reindexing timeline generation and add a preview viewer api#19511
feat: Refactor reindexing timeline generation and add a preview viewer api#19511capistrant wants to merge 1 commit into
Conversation
|
|
||
| private static SegmentTimeline createTimeline(DateTime start, DateTime end) | ||
| { | ||
| final DataSegment segment = DataSegment.builder() |
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 2 |
| P3 | 0 |
| Total | 2 |
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 2 |
| P3 | 0 |
| Total | 2 |
Reviewed 13 of 13 changed files.
This is an automated review by Codex GPT-5.5
| final IntervalPartitioningInfo originalInfo = searchIntervals.get(i); | ||
| final Interval originalInterval = originalInfo.getInterval(); | ||
|
|
||
| if (!originalInterval.overlaps(dataRangeWithSkipOffset)) { |
There was a problem hiding this comment.
[P2] Skip-offset-only intervals are mislabeled as no-data
The no-overlap branch also catches intervals that are entirely after dataRangeWithSkipOffset because of skipOffsetFromNow or skipOffsetFromLatest. Those intervals still overlap the live timeline, but they are recorded as SKIPPED_NO_DATA; ReindexingTimelineView filters that disposition out, so the preview hides intervals skipped only by the skip-offset boundary. Please distinguish live intervals beyond the truncation boundary and mark them SKIPPED_BEYOND_BOUNDARY.
There was a problem hiding this comment.
this seems valid, what if dataRangeWithSkipOffset is [Day0, DAY10] and searchInterval is [DAY12, DAY13]?
i also wonder what if this just start a reindex task anyway, if the interval is before dataRange, is anything horrible going to happen? Even if the interval is within dataRange, it's also possible timeline is empty in that specific interval.
I feel it'd be simpler if there's only skip offset, and search interval based on rules.
There was a problem hiding this comment.
Thanks, I agree the job path should not start a task for [DAY12, DAY13] in that example. The concern is the preview classification, not task creation. With live data through DAY13 and a skip-offset effective end at DAY10, [DAY12, DAY13] is data that is skipped because of the configured offset. ReindexingTimelineView.fromPlan emits SKIPPED_BEYOND_BOUNDARY but deliberately suppresses SKIPPED_NO_DATA, so labeling that interval as no-data hides the operator-visible skipped span from the API. Intervals before the live data range should still be SKIPPED_NO_DATA; intervals past the skip boundary but within the live timeline should be SKIPPED_BEYOND_BOUNDARY.
Reviewed 13 of 13 changed files.
| LOG.warn("No search intervals generated for dataSource[%s], no reindexing jobs will be created", dataSource); | ||
| return Collections.emptyList(); | ||
| } | ||
| final ReindexingPlan plan = new ReindexingPlanner(this).plan(jobParams.getScheduleStartTime(), timeline); |
There was a problem hiding this comment.
[P2] Job creation ignores planner validation errors
ReindexingPlanner.plan() now converts invalid granularity timelines into a PlanValidationError and an empty interval list for API preview, but the job creation path never checks plan.getValidationError(). Invalid cascading configs that previously failed job generation can now silently produce no jobs, masking a broken supervisor as an empty run. Please preserve the old failure or surface the validation error before returning jobs.
| @Override | ||
| public ReindexingTimelineView previewReindexingTimeline(SupervisorSpec spec, DateTime referenceTime) | ||
| { | ||
| if (!(spec instanceof CompactionSupervisorSpec)) { |
There was a problem hiding this comment.
nit: compactionSpec can go into this line
| } | ||
| final CompactionSupervisorSpec compactionSpec = (CompactionSupervisorSpec) spec; | ||
| final CompactionJobTemplate template = compactionSpec.getTemplate(); | ||
| if (!(template instanceof CascadingReindexingTemplate)) { |
There was a problem hiding this comment.
nit: cascadingTemplate can go into this line
| } | ||
|
|
||
| final ReindexingPlan.SkipOffsetResolution skipOffsetResolution = buildSkipOffsetResolution(timeline, referenceTime); | ||
| final Interval dataRangeWithSkipOffset = computeDataRangeWithSkipOffset(timeline, referenceTime); |
There was a problem hiding this comment.
it seems like this could be derived from skipOffsetResolution?
| * timeline — operators don't see a "configured but not applied" state because the scheduler | ||
| * short-circuits to an empty timeline when no segments exist. | ||
| */ | ||
| public static class SkipOffsetInfo |
There was a problem hiding this comment.
this is almost identical as SkipOffsetResolution, can we merge them?
| } | ||
|
|
||
| @Nullable | ||
| static ValidationError fromPlan(@Nullable ReindexingPlan.PlanValidationError error) |
There was a problem hiding this comment.
this is almost identical as ReindexingPlan.PlanValidationError, can we merge them?
| final Period skipFromNow = template.getSkipOffsetFromNowOrNull(); | ||
| final Period skipFromLatest = template.getSkipOffsetFromLatestOrNull(); | ||
|
|
||
| DateTime end = dataBounds.getEnd(); |
There was a problem hiding this comment.
so start is always the timeline start but end depends on offsets in template? also what if data end is before the skip offset, would we want a smaller interval?
| final IntervalPartitioningInfo originalInfo = searchIntervals.get(i); | ||
| final Interval originalInterval = originalInfo.getInterval(); | ||
|
|
||
| if (!originalInterval.overlaps(dataRangeWithSkipOffset)) { |
There was a problem hiding this comment.
this seems valid, what if dataRangeWithSkipOffset is [Day0, DAY10] and searchInterval is [DAY12, DAY13]?
i also wonder what if this just start a reindex task anyway, if the interval is before dataRange, is anything horrible going to happen? Even if the interval is within dataRange, it's also possible timeline is empty in that specific interval.
I feel it'd be simpler if there's only skip offset, and search interval based on rules.
| private final ValidationError validationError; | ||
|
|
||
| @JsonCreator | ||
| public ReindexingTimelineView( |
There was a problem hiding this comment.
ReindexingTimelineView sounds ambiguous, it's not like SegmentTimeline or ServerView. maybe ReindexingPlanResponse, WDYT?
Description
Replacement for #19051. This splits the server side code out from that and refactors it, to achieve the same goal in a hopefully more architecturally sound manner. @vogievetsky has volunteered to help with the UI side.
New Supervisor API (cascading reindexing supervisors only)
This generates a timeline of search intervals for reindexing with the effective sets of rules that are used to create the underlying inline compaction configs for the different intervals. This is the business end of the console UI described below
Reindexing Timeline Refactor
ReindexingPlannerandReindexingPlanare new classes that extract out the internals that take a datasource timeline, a rule provider (plus other reindex cascade config bits) and generate a set of sorted intervals that each apply reindexing rules to said interval.ReindexingTimelineViewis the public DTO for the new API and is what the UI in the console will be built to present the visualized reindexing timeline to users/operators.Release note
Adds a new compaction supervisor api for supervisors who use
reindexCompacttemplate.This API allows viewing of a timeline of intervals, each with what set of reindeixing rules that they will apply to the underlying interval.
Key changed/added classes in this PR
CascadingReindexingTemplateCompactionSchedulerOverlordCompactionSchedulerReindexingPlanReindexingPlannerReindexingTimelineViewSupervisorResourceThis PR has: